-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-4857] [CORE] Adds Executor membership events to SparkListener #3711
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #24494 has started for PR 3711 at commit
|
|
Test build #24494 has finished for PR 3711 at commit
|
|
Test FAILed. |
b1f715d to
ab2575f
Compare
|
Test build #24496 has started for PR 3711 at commit
|
|
Test build #24496 has finished for PR 3711 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this class? Why not just store it in the events themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was so that we can keep the ExeuctorInfo for the internal akka message and the SparkListener in sync. The parent class, ExecutorInfo, contains all the info we want to share with the SparkListener and others and the ExecutorData child class add additional information like the akka address that should not be exposed. Adding this info to the events themselves is possible but we will lead to duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think that this should be a trait/interface rather than a class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'll second @andrewor14's comment from the other PR: I think that the ExecutorInfo name might be potentially confusing since we already have an internal ExecutorInfo class. So, if you think of a better name maybe we could change it.
@andrewor14 Regarding whether we should have a separate class or not, you could imagine that the web UI might want to store some information on executors for later display / lookup. By storing the executor info outside of the message, we can just store the ExecutorInfo class. If the fields were part of the event, then we'd have to store the events themselves or re-pack the data into some new class that would be essentially the same as this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoshRosen yes, we should probably make it a trait/interface instead of a class. I'll try and pick a better name than ExecutorInfo. I was trying to follow the TaskInfo pattern but yes it might be confusing since there is another class named ExecutorInfo.
|
Hi @ksakellis, This PR has some nice additions and looks pretty good overall. In particular, I like the addition of the I have one main piece of feedback: I think we need to add these new events to JSONProtocol / JSONProtocolSuite so that they're properly persisted in the event log. I think there's a bunch of examples of this in the code, so this should be fairly straightforward. You may end up having to write JSON serializers for ExecutorInfo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To re-iterate a style comment from Andrew: our style is to not place spaces before :, so this should be
executorInfo: ExecutorInfoab2575f to
6e06a79
Compare
|
@JoshRosen I actually couldn't make ExecutorInfo (now ExecutorDetails) a trait because the json protocol stuff requires a concrete object to deserialize events. The other option is to keep ExecutorDetails a trait and create a third object as a concrete implementation but that seemed to add unnecessary complexity. |
|
Test build #25060 has started for PR 3711 at commit
|
6e06a79 to
7b64ea0
Compare
|
Test build #25061 has started for PR 3711 at commit
|
|
Test build #25060 has finished for PR 3711 at commit
|
|
Test FAILed. |
|
This looks like a legitimate test failure. Ther AMPLab webserver is having some issues today, so here's a different link to reach the same test result: https://hadrian.ist.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25060/testReport/ |
7b64ea0 to
776d743
Compare
|
Yes it is. Not sure why I changed the #of cores between the two commits in the unit test - weird. Anyways. it has been fixed. |
|
Test build #25065 has started for PR 3711 at commit
|
|
Test build #25061 has finished for PR 3711 at commit
|
|
Test FAILed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about calling this JavaSparkListener - that's what we've tended to use in the past for things that were supposed to be "drop in" substitutes for Scala classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on JavaSparkListener. It's weird for the user to extend an "adapter"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for completeness (since I suggested the original name), listener/adapter is used extensively in the Java API (see java.awt.event, for example). An adapter is just a default implementation of a listener in Java-ese.
|
Test build #25065 has finished for PR 3711 at commit
|
|
Test PASSed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the executor removed event need to include executor details? The listener already listens for executor added events, which contain the same details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the event more self describing. Otherwise you'd need to also listen on the executorAdded event, have a hashmap then look it up in the executor removed method to find the details.
That being said, I don't really feel that strongly so we can remove the executorDetails structure if you are concerned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just kind of weird that you can't remove an executor without having full information about it. I think it's reasonable to expect the caller to listen for both the add and remove events.
|
Left a few relatively minor comments. Otherwise this LGTM. |
Adds onExecutorAdded and onExecutorRemoved events to the SparkListener. This will allow a client to get notified when an executor has been added/removed and provide additional information such as how many vcores it is consuming. In addition, this commit adds a SparkListenerAdapter to the Java API that provides default implementations to the SparkListener. This is to get around the fact that default implementations for traits don't work in Java. Having Java clients extend SparkListenerAdapter moving forward will prevent breakage in java when we add new events to SparkListener.
We want to persist the executor added/removed events. Renamed ExecutorInfo to ExecutorDetails to avoid conflicting name. Also additional CR feedback.
This is to support a Mesos fine grained scheduler.
776d743 to
946d2c5
Compare
|
Test build #25188 has started for PR 3711 at commit
|
|
Test build #25188 has finished for PR 3711 at commit
|
|
Test PASSed. |
|
Ok, latest changes look fine to me. Any other comments @pwendell @JoshRosen? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of making this into an integration test (the local-cluster mode is pretty expensive) could you modify LocalBackend to send an executor added event? Then you can still test with local mode, and also there will be less of an overall difference between the local and the cluster modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can, but then that is not really testing much is it. The point of this test was to test that we are getting the onExecutorAdded event when an executor is added to the cluster. Modifying the LocalBacked to publish these events will greatly(in my opinion) reduce the usefulness of the test. We'd only be testing some of the SparkListener plumbing and not the actual creation of the events.
|
LGTM - just had a minor comment that can also be addressed on merge. Jenkins, test this please. |
|
Test build #25406 has started for PR 3711 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we are doing this here instead of in "executorLost" method of DAGScheduler.scala ? (similarly for SparkListenerExecutorAdded event above)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ksakellis that seems like a good idea actually.
|
Test build #25406 has finished for PR 3711 at commit
|
|
Test PASSed. |
|
Actually, why not add these in the DAG Scheduler per @nitin2goyal's suggestion. |
|
@pwendell @nitin2goyal I had previously thought of adding it to the DAGScheduler - this would avoid the duplicated code between the CoarseGrainedSchedulerBackend and MesosSchedulerBackend. The reason I did not go with this approach is two fold: The consequence is that the events are generated at a lower level but I think they are more accurate. Thoughts? I'd really like to get this merged so that it can unblock: #3486 (Yarn links in UI) which i think is pretty important for usability. |
|
Okay - let's do this as-is and then we can consider changing it later. It's purely an internal API. |
|
I believe apache git is down right now, but this is ready to merge from my perspective. |
Adds onExecutorAdded and onExecutorRemoved events to the SparkListener. This will allow a client to get notified when an executor has been added/removed and provide additional information such as how many vcores it is consuming.
In addition, this commit adds a SparkListenerAdapter to the Java API that provides default implementations to the SparkListener. This is to get around the fact that default implementations for traits don't work in Java. Having Java clients extend SparkListenerAdapter moving forward will prevent breakage in java when we add new events to SparkListener.